Background¶
Observing vegetation health from space¶
We will look at vegetation health using NDVI (Normalized Difference Vegetation Index). How does it work? First, we need to learn about spectral reflectance signatures.
Every object reflects some wavelengths of light more or less than others. We can see this with our eyes, since, for example, plants reflect a lot of green in the summer, and then as that green diminishes in the fall they look more yellow or orange. The image below shows spectral signatures for water, soil, and vegetation:
> Image source: SEOS
Project
Healthy vegetation reflects a lot of Near-InfraRed (NIR) radiation. Less healthy vegetation reflects a similar amounts of the visible light spectra, but less NIR radiation. We don’t see a huge drop in Green radiation until the plant is very stressed or dead. That means that NIR allows us to get ahead of what we can see with our eyes.
Different species of plants reflect different spectral signatures, but the pattern of the signatures across species and sitations is similar. NDVI compares the amount of NIR reflectance to the amount of Red reflectance, thus accounting for many of the species differences and isolating the health of the plant. The formula for calculating NDVI is:
$$NDVI = \frac{(NIR - Red)}{(NIR + Red)}$$
Read more about NDVI and other vegetation indices:
id = 'stars'
site_name = 'Gila River Indian Community'
project_name = 'Gila River Vegetation'
boundary_dir = 'tl_2020_us_aitsn'
event = 'water rights case'
start_year = '2001'
end_year = '2022'
event_year = '2012'
00: Setup¶
Import Libraries and Download Sample Data¶
import json
import os
import pathlib
from glob import glob #C Combine data arrays
import earthpy # Manage local data
import geopandas as gpd # Work with geospatial vector data
import pandas as pd # Work with tabular data
import hvplot.pandas
import numpy as np
import matplotlib.pyplot as plt
from shapely.geometry import Point
import hvplot.xarray
import rioxarray as rxr # To work with raster data
import xarray as xr # To work with data arrays and raster data
project = earthpy.Project(
'Gila River Vegetation', dirname='Vegetation')
project.get_data()
01: Create Site Map¶
Study Area: Gila River Indian Community¶
Earth Data Science data formats¶
In Earth Data Science, we get data in three main formats:
| Data type | Descriptions | Common file formats | Python type |
|---|---|---|---|
| Time Series | The same data points (e.g. streamflow) collected multiple times over time | Tabular formats (e.g. .csv, or .xlsx) | pandas DataFrame |
| Vector | Points, lines, and areas (with coordinates) | Shapefile (often an archive like a .zip file because a Shapefile is actually a collection of at least 3 files) |
geopandas GeoDataFrame |
| Raster | Evenly spaced spatial grid (with coordinates) | GeoTIFF (.tif), NetCDF (.nc), HDF (.hdf) |
rioxarray DataArray |
about vector data and raster data in the textbook.
Reflect and Respond¶
- For this coding challenge, we are interested in the Gila River Indian Community (GRIC) boundary and the health of the vegetation in the area measure on a scale from -1 to 1.
What type of data do you think the boundary will be?
- The data for the boundary will probably be a polygon data type.
What type of data do you think the vegetation health will be?
- The vegetation health will probably be an object (variable) data type.
Load the Gila River Indian Community boundary¶
# Load in the boundary data
aitsn_gdf = gpd.read_file(project.project_dir / 'tl_2020_us_aitsn')
# Check that it worked
aitsn_gdf.head()
| AIANNHCE | TRSUBCE | TRSUBNS | GEOID | NAME | NAMELSAD | LSAD | CLASSFP | MTFCC | FUNCSTAT | ALAND | AWATER | INTPTLAT | INTPTLON | geometry | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2430 | 653 | 02419073 | 2430653 | Red Valley | Red Valley Chapter | T2 | D7 | G2300 | A | 922036695 | 195247 | +36.6294607 | -109.0550394 | POLYGON ((-109.2827 36.64644, -109.28181 36.65... |
| 1 | 2430 | 665 | 02419077 | 2430665 | Rock Point | Rock Point Chapter | T2 | D7 | G2300 | A | 720360268 | 88806 | +36.6598701 | -109.6166836 | POLYGON ((-109.85922 36.49859, -109.85521 36.5... |
| 2 | 2430 | 675 | 02419081 | 2430675 | Rough Rock | Rough Rock Chapter | T2 | D7 | G2300 | A | 364475668 | 216144 | +36.3976971 | -109.7695183 | POLYGON ((-109.93053 36.40672, -109.92923 36.4... |
| 3 | 2430 | 325 | 02418975 | 2430325 | Indian Wells | Indian Wells Chapter | T2 | D7 | G2300 | A | 717835323 | 133795 | +35.3248534 | -110.0855000 | POLYGON ((-110.24222 35.36327, -110.24215 35.3... |
| 4 | 2430 | 355 | 02418983 | 2430355 | Kayenta | Kayenta Chapter | T2 | D7 | G2300 | A | 1419241065 | 1982848 | +36.6884391 | -110.3045616 | POLYGON ((-110.56817 36.73489, -110.56603 36.7... |
# Make interactive plot to view data and find site location
aitsn_gdf.hvplot(
geo=True, tiles='EsriImagery',
frame_width=500,
legend=False, fill_color=None, edge_color='white',
# This parameter makes all the column values in the dataset visible.
hover_cols='all')
WARNING:param.main: edge_color option not found for polygons plot with bokeh; similar options include: ['muted_color', 'bgcolor', 'line_color']
Reflect and Respond¶
What column could you use to uniquely identify the subdivisions of the reservation you want to study using this interactive map? What value do you need to use to filter the GeoDataFrame?¶
- The column AIANNHCE is the identifier for the tribal subdivision
- Looking for Phoenix, Arizona at lat,lon 33 N, 112 W in the interactive map, we find AIANNHCE is number 1310
# Check what type of data the AIANNHCE column is
aitsn_gdf.info()
<class 'geopandas.geodataframe.GeoDataFrame'> RangeIndex: 484 entries, 0 to 483 Data columns (total 15 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 AIANNHCE 484 non-null object 1 TRSUBCE 484 non-null object 2 TRSUBNS 484 non-null object 3 GEOID 484 non-null object 4 NAME 484 non-null object 5 NAMELSAD 484 non-null object 6 LSAD 484 non-null object 7 CLASSFP 484 non-null object 8 MTFCC 484 non-null object 9 FUNCSTAT 484 non-null object 10 ALAND 484 non-null int64 11 AWATER 484 non-null int64 12 INTPTLAT 484 non-null object 13 INTPTLON 484 non-null object 14 geometry 484 non-null geometry dtypes: geometry(1), int64(2), object(12) memory usage: 56.8+ KB
# Select and merge the Gila River subdivisions
gila_gdf = aitsn_gdf.loc[aitsn_gdf.AIANNHCE=='1310'].dissolve()
# Plot the results with web tile images
gila_gdf.hvplot(
geo=True, tiles='EsriImagery',
fill_color=None, line_color='black',
title='Gila River Indian Community',
frame_width=500)
%store project gila_gdf aitsn_gdf
Stored 'project' (Project) Stored 'gila_gdf' (GeoDataFrame) Stored 'aitsn_gdf' (GeoDataFrame)
02: Wrangle Data¶
Load in NDVI data¶
- In this section, we need to extract the date from the filename. We will use the glob.rglob() function in Python to find all files that match a pattern.
Reflect and Respond¶
Take a look at the file names for the NDVI files. What do you notice is the same for all the files?
Keep in mind that you only want the NDVI files, not the Quality files (which will flag potential incorrect measurements)
- Looking in the gila-river-ndvi folder, we can see that all the file names all include NDVI and are of type .tif, so we can use the wildcard character (), for files 'NDVI*.tif'
# Get a sorted list of NDVI tif file paths
ndvi_paths = sorted(list(project.project_dir.rglob('*NDVI*.tif')))
# See how many files are in this list
print (len(ndvi_paths))
# Display the first and last three files paths to check the pattern
(ndvi_paths[:3], ndvi_paths[-3:])
154
([PosixPath('/workspaces/data/vegetation/gila-river-ndvi/MOD13Q1.061_2001137_to_2022244/MOD13Q1.061__250m_16_days_NDVI_doy2001145000000_aid0001.tif'),
PosixPath('/workspaces/data/vegetation/gila-river-ndvi/MOD13Q1.061_2001137_to_2022244/MOD13Q1.061__250m_16_days_NDVI_doy2001161000000_aid0001.tif'),
PosixPath('/workspaces/data/vegetation/gila-river-ndvi/MOD13Q1.061_2001137_to_2022244/MOD13Q1.061__250m_16_days_NDVI_doy2001177000000_aid0001.tif')],
[PosixPath('/workspaces/data/vegetation/gila-river-ndvi/MOD13Q1.061_2001137_to_2022244/MOD13Q1.061__250m_16_days_NDVI_doy2022209000000_aid0001.tif'),
PosixPath('/workspaces/data/vegetation/gila-river-ndvi/MOD13Q1.061_2001137_to_2022244/MOD13Q1.061__250m_16_days_NDVI_doy2022225000000_aid0001.tif'),
PosixPath('/workspaces/data/vegetation/gila-river-ndvi/MOD13Q1.061_2001137_to_2022244/MOD13Q1.061__250m_16_days_NDVI_doy2022241000000_aid0001.tif')])
# Extract information from the data sets
doy_start = -25
doy_end = -19
# Loop through each NDVI image
ndvi_das = [] # Create an empty list of data arrays
for ndvi_path in ndvi_paths: # ndvi_path equivalent to i
# Get date from file name
doy = ndvi_path.name[doy_start:doy_end]
date = pd.to_datetime(doy, format='%Y%j') # Use pandas to change doy to date
# Open dataset
da = rxr.open_rasterio(ndvi_path, mask_and_scale=True).squeeze() # Create data array dataset
# Add date dimension and clean up metadata
da = da.assign_coords({'date': date})
da = da.expand_dims({'date': 1})
da.name = 'NDVI'
# Prepare for concatenation
ndvi_das.append(da) # Add the data array in each loop to the list of NDVI arrays
# Return length of list
len(ndvi_das)
154
# Combine NDVI images (Raster data) from all dates
ndvi_da = xr.combine_by_coords(ndvi_das, coords=['date'])
ndvi_da
/tmp/ipykernel_2734/570708293.py:2: FutureWarning: In a future version of xarray the default value for compat will change from compat='no_conflicts' to compat='override'. This is likely to lead to different results when combining overlapping variables with the same name. To opt in to new defaults and get rid of these warnings now use `set_options(use_new_combine_kwarg_defaults=True) or set compat explicitly. ndvi_da = xr.combine_by_coords(ndvi_das, coords=['date']) /tmp/ipykernel_2734/570708293.py:2: FutureWarning: In a future version of xarray the default value for compat will change from compat='no_conflicts' to compat='override'. This is likely to lead to different results when combining overlapping variables with the same name. To opt in to new defaults and get rid of these warnings now use `set_options(use_new_combine_kwarg_defaults=True) or set compat explicitly. ndvi_da = xr.combine_by_coords(ndvi_das, coords=['date'])
<xarray.Dataset> Size: 48MB
Dimensions: (date: 154, y: 203, x: 382)
Coordinates:
band int64 8B 1
* x (x) float64 3kB -112.3 -112.3 -112.3 ... -111.5 -111.5 -111.5
* y (y) float64 2kB 33.39 33.39 33.38 33.38 ... 32.97 32.97 32.97
spatial_ref int64 8B 0
* date (date) datetime64[ns] 1kB 2001-01-14 2001-01-16 ... 2022-01-24
Data variables:
NDVI (date, y, x) float32 48MB 0.8282 0.6146 ... 0.2146 0.2085%store ndvi_da ndvi_das ndvi_paths
Stored 'ndvi_da' (Dataset) Stored 'ndvi_das' (list) Stored 'ndvi_paths' (list)
03: Plot Data¶
# Plot first .tif image
rxr.open_rasterio(ndvi_paths[0], mask_and_scale=True).squeeze().plot()
<matplotlib.collections.QuadMesh at 0x789d9590da50>
# Plot the last .tif image
rxr.open_rasterio(ndvi_paths[-1], mask_and_scale=True).squeeze().plot()
<matplotlib.collections.QuadMesh at 0x789d954794d0>
# Plot first and last image side-by-side for comparison
recent_ndvi = rxr.open_rasterio(ndvi_paths[-1], mask_and_scale=True).squeeze()
old_ndvi = rxr.open_rasterio(ndvi_paths[0], mask_and_scale=True).squeeze()
# Create side-by-side subplots
fig, axes = plt.subplots(1, 2, figsize=(12,4)) # 1 column and 2 rows
# Plot each in their own axes
old_ndvi.plot(ax=axes[0], cmap=plt.cm.PiYG, vmin=-1, vmax=1) # Set axes between -1 and 1
axes[0].set_title("NDVI - Gila River 2001")
gila_gdf.plot(ax=axes[0], edgecolor='black', facecolor='none',linewidth=1)
recent_ndvi.plot(ax=axes[1], cmap=plt.cm.PiYG, vmin=-1, vmax=1, ) # Set axes between -1 and 1
axes[1].set_title("NDVI - Gila River 2022")
gila_gdf.plot(ax=axes[1], edgecolor='black', facecolor='none',linewidth=1)
<Axes: title={'center': 'NDVI - Gila River 2022'}, xlabel='x', ylabel='y'>
Plot NDVI data differences¶
# Compute the difference in NDVI before and after the return of eater rights in 2004
ndvi_diff = (
ndvi_da
.sel(date=slice('2012', '2022')) # Choose the dates between 2012-2022
.mean('date') # Calculate the mean based on the date
.NDVI
- ndvi_da
.sel(date=slice('2001', '2011')) # Choose the dates between 2001-2011
.mean('date')
.NDVI
)
ndvi_diff
#ndvi_diff.plot()
<xarray.DataArray 'NDVI' (y: 203, x: 382)> Size: 310kB
array([[-0.05567682, -0.0292117 , 0.00586349, ..., 0.01543377,
0.01543377, 0.00927271],
[-0.07940263, -0.03390124, -0.02959213, ..., 0.01815718,
0.01815718, 0.0177182 ],
[-0.17723629, -0.08530393, 0.01360923, ..., 0.01517531,
0.00823637, 0.01195324],
...,
[-0.0115844 , -0.0115844 , -0.00991558, ..., -0.00157142,
-0.00157142, 0.00205326],
[-0.01115062, -0.01115062, -0.00994415, ..., 0.00598571,
0.00598571, -0.00095583],
[-0.00930774, -0.00849222, -0.01209998, ..., -0.0296714 ,
-0.02090381, -0.03258313]], shape=(203, 382), dtype=float32)
Coordinates:
band int64 8B 1
* x (x) float64 3kB -112.3 -112.3 -112.3 ... -111.5 -111.5 -111.5
* y (y) float64 2kB 33.39 33.39 33.38 33.38 ... 32.97 32.97 32.97
spatial_ref int64 8B 0# Plot the difference
(
ndvi_diff.hvplot(x='x', y='y', cmap='PiYG', geo=True,
title="NDVI Changes at Gila River Indian Commmunity\n2012-2022 vs. 2001-2011")
*
gila_gdf.hvplot(geo=True, fill_color=None, line_color='black')
)
Conclusion¶
In the figure above, green (a positive score in the differences of NDVI) shows that land got healthier. The pink shows how land got less healthy over time. In the figure we can see that a lot of land in the area outside the Gila River Indian Community got worse over the years, but that has not occurred as much within the GRIC boundary.
These results seem to indicate that when water rights were returned to the indigenous community, there was improvements in the overall health of the land.
Research (https://www4.evergreen.edu/sites/default/files/When_Our_Water_returns_10-25-09.pdf) has shown that when the land was not under indigienous ownership, the river was running dry and there was famine and deprivation within the Indian community. This research aligns with the results we see in the above image.
We can also see in the image, significant dark areas of green where there appear to be agricultural crop growths, indicating imrpovement in local farming in the area.
04: Summarize Results¶
STEP 4: Is the NDVI different within the Gila River Indian Community after the return of water rights?¶
In this section, we compute the mean NDVI inside and outside GRIC boundary.
- Check the variable names - Make sure that the code uses your
boundary
GeoDataFrame - How could you test if the geometry was modified correctly? Add some code to take a look at the results.
# Compute the area outside the Gila River Indian Community Boundary
outside_gila_gdf = (
gpd.GeoDataFrame(geometry=gila_gdf.envelope)
.overlay(gila_gdf, how='difference') #Geometry of the Gila River Community's envelope (surrounding area)
)
outside_gila_gdf
| geometry | |
|---|---|
| 0 | MULTIPOLYGON (((-112.30875 32.96704, -112.3087... |
# Check that we collected outside boundary
outside_gila_gdf.plot()
<Axes: >
# Clip data to both inside and outside the boundary
ndvi_inside = ndvi_da.rio.clip(gila_gdf.geometry, from_disk=True)
print(ndvi_inside)
ndvi_outside = ndvi_da.rio.clip(outside_gila_gdf.geometry, from_disk=True)
print(ndvi_outside)
<xarray.Dataset> Size: 47MB
Dimensions: (x: 379, y: 202, date: 154)
Coordinates:
band int64 8B 1
* x (x) float64 3kB -112.3 -112.3 -112.3 ... -111.5 -111.5 -111.5
* y (y) float64 2kB 33.39 33.38 33.38 33.38 ... 32.97 32.97 32.97
* date (date) datetime64[ns] 1kB 2001-01-14 2001-01-16 ... 2022-01-24
spatial_ref int64 8B 0
Data variables:
NDVI (date, y, x) float32 47MB nan nan nan nan ... nan nan nan nan
<xarray.Dataset> Size: 48MB
Dimensions: (x: 380, y: 203, date: 154)
Coordinates:
band int64 8B 1
* x (x) float64 3kB -112.3 -112.3 -112.3 ... -111.5 -111.5 -111.5
* y (y) float64 2kB 33.39 33.39 33.38 33.38 ... 32.97 32.97 32.97
* date (date) datetime64[ns] 1kB 2001-01-14 2001-01-16 ... 2022-01-24
spatial_ref int64 8B 0
Data variables:
NDVI (date, y, x) float32 48MB 0.6146 0.3796 ... 0.1736 0.2146
# Compute mean annual July NDVI
july_ndvi_inside_df = (ndvi_inside
.groupby(ndvi_inside.date.dt.year)
.mean(...)
.NDVI.to_dataframe())
print('Mean NDVI inside Gila Community \n', july_ndvi_inside_df.head())
july_ndvi_outside_df = (ndvi_outside
.groupby(ndvi_outside.date.dt.year)
.mean(...)
.NDVI.to_dataframe())
print('Mean NDVI outside Gila Community \n', july_ndvi_outside_df.head())
Mean NDVI inside Gila Community
band spatial_ref NDVI
year
2001 1 0 0.199645
2002 1 0 0.177933
2003 1 0 0.187302
2004 1 0 0.176162
2005 1 0 0.238630
Mean NDVI outside Gila Community
band spatial_ref NDVI
year
2001 1 0 0.247629
2002 1 0 0.226726
2003 1 0 0.229889
2004 1 0 0.221753
2005 1 0 0.255275
Mean NDVI outside Gila Community
band spatial_ref NDVI
year
2001 1 0 0.247629
2002 1 0 0.226726
2003 1 0 0.229889
2004 1 0 0.221753
2005 1 0 0.255275
# Join inside and outside data frame
july_ndvi_df = (july_ndvi_inside_df[['NDVI']]
.join(july_ndvi_outside_df[['NDVI']],
lsuffix=' Inside Gila Community',
rsuffix=' Outside Gila Community')) #left is one you start with
print(july_ndvi_df)
NDVI Inside Gila Community NDVI Outside Gila Community year 2001 0.199645 0.247629 2002 0.177933 0.226726 2003 0.187302 0.229889 2004 0.176162 0.221753 2005 0.238630 0.255275 2006 0.211491 0.235571 2007 0.181710 0.211984 2008 0.201902 0.237739 2009 0.179118 0.218907 2010 0.201366 0.235849 2011 0.173018 0.217433 2012 0.180081 0.222250 2013 0.181592 0.223138 2014 0.192506 0.224528 2015 0.200857 0.227473 2016 0.179837 0.213712 2017 0.195767 0.225505 2018 0.178278 0.210275 2019 0.212160 0.227949 2020 0.205625 0.224280 2021 0.223103 0.237178 2022 0.197994 0.212205
# Plot mean NDVI inside and outside the Gila River Indian Community boundary
july_ndvi_df.hvplot(title='Mean July NDVI Inside and Outside Gila River Indian Community')
# Add a new column that takes the difference between inside and outside boundaries
july_ndvi_df['Difference'] = (july_ndvi_df['NDVI Outside Gila Community']
- july_ndvi_df['NDVI Inside Gila Community'])
july_ndvi_df
# Difference is getting less negative, getting closer to zero, getting closer to the same
| NDVI Inside Gila Community | NDVI Outside Gila Community | Difference | |
|---|---|---|---|
| year | |||
| 2001 | 0.199645 | 0.247629 | 0.047984 |
| 2002 | 0.177933 | 0.226726 | 0.048793 |
| 2003 | 0.187302 | 0.229889 | 0.042588 |
| 2004 | 0.176162 | 0.221753 | 0.045591 |
| 2005 | 0.238630 | 0.255275 | 0.016645 |
| 2006 | 0.211491 | 0.235571 | 0.024079 |
| 2007 | 0.181710 | 0.211984 | 0.030274 |
| 2008 | 0.201902 | 0.237739 | 0.035836 |
| 2009 | 0.179118 | 0.218907 | 0.039790 |
| 2010 | 0.201366 | 0.235849 | 0.034484 |
| 2011 | 0.173018 | 0.217433 | 0.044415 |
| 2012 | 0.180081 | 0.222250 | 0.042169 |
| 2013 | 0.181592 | 0.223138 | 0.041546 |
| 2014 | 0.192506 | 0.224528 | 0.032022 |
| 2015 | 0.200857 | 0.227473 | 0.026616 |
| 2016 | 0.179837 | 0.213712 | 0.033875 |
| 2017 | 0.195767 | 0.225505 | 0.029738 |
| 2018 | 0.178278 | 0.210275 | 0.031997 |
| 2019 | 0.212160 | 0.227949 | 0.015789 |
| 2020 | 0.205625 | 0.224280 | 0.018655 |
| 2021 | 0.223103 | 0.237178 | 0.014075 |
| 2022 | 0.197994 | 0.212205 | 0.014210 |
# Plot the NDVI difference inside and outside the boundary
july_ndvi_df.Difference.hvplot(title='Difference in NDVI within and outside the Gila River Indian Community')
%store july_ndvi_df july_ndvi_inside_df july_ndvi_outside_df
Stored 'july_ndvi_df' (DataFrame) Stored 'july_ndvi_inside_df' (DataFrame) Stored 'july_ndvi_outside_df' (DataFrame)